PRES: A score metric for evaluating recall-oriented information retrieval applications
Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while these are often discussed with equal importance, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application taking account of recall and the user’s search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user’s expected search effort.
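The abstract does not state the PRES formula. As a minimal sketch, assuming the commonly cited definition PRES = 1 - (mean rank of relevant documents - (n + 1) / 2) / N_max, where n is the number of relevant documents, N_max is the maximum number of results the user is willing to check, and relevant documents not found within N_max are assumed to sit just past the cutoff, a per-topic score could be computed as below. The function name `pres` and its parameters are illustrative, not taken from the listing.

```python
def pres(relevant_ranks, n_relevant, n_max):
    """Sketch of a per-topic PRES score (assumed definition, see lead-in).

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    n_relevant:     total number of relevant documents for the topic (n).
    n_max:          maximum number of results the user will check (N_max).
    """
    if n_relevant <= 0 or n_max <= 0:
        raise ValueError("n_relevant and n_max must be positive")

    # Only relevant documents ranked within the user's cutoff count as found.
    found = sorted(r for r in relevant_ranks if 1 <= r <= n_max)
    if len(found) > n_relevant:
        raise ValueError("more relevant ranks than relevant documents")

    # Assumption: missed relevant documents are placed at the ranks
    # immediately after the cutoff, N_max + |found| + 1 ... N_max + n.
    missing_sum = sum(range(n_max + len(found) + 1, n_max + n_relevant + 1))

    mean_rank = (sum(found) + missing_sum) / n_relevant
    return 1.0 - (mean_rank - (n_relevant + 1) / 2) / n_max


if __name__ == "__main__":
    # All three relevant documents at the very top of the list.
    print(pres([1, 2, 3], n_relevant=3, n_max=100))
    # Full recall, but only at the bottom of what the user will check.
    print(pres([98, 99, 100], n_relevant=3, n_max=100))
    # One relevant document at rank 1, two missed.
    print(pres([1], n_relevant=3, n_max=100))
```

Under this assumed definition the score depends on both how many relevant documents are found and how far down the list the user must read to reach them, which is the combination of recall and search effort the abstract describes.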
Applying the KISS principle for the CLEF-IP 2010 prior art candidate patent search task
We present our experiments and results for the DCU CNGL participation in the CLEF-IP 2010 Candidate Patent Search Task. Our work applied standard information retrieval (IR) techniques to patent search. In addition, a very simple citation extraction method was applied to improve the results. This was our second consecutive participation in the CLEF-IP tasks. Our experiments in 2009 showed that many sophisticated approaches to IR do not improve the retrieval effectiveness for this task. For this reason, we decided to apply only simple methods in 2010. These were demonstrated to be highly competitive with other participants. DCU submitted three runs for the Prior Art Candidate Search Task; two of these runs achieved the second and third ranks among the 25 runs submitted by nine different participants. Our best run achieved a MAP of 0.203, recall of 0.618, and PRES of 0.523.
A study of query expansion methods for patent retrieval
Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long since they take the form of a patent claim or even a full patent application in the case of prior-art patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has tried to address this mismatch through the application of query expansion (QE) techniques, which have generally been shown to be effective for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonym sets is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach shows statistically better retrieval precision over the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis of the results is presented which seeks to better understand situations where these QE techniques succeed or fail.
A new metric for patent retrieval evaluation
Patent retrieval is generally considered to be a recall-oriented information retrieval task that is growing in importance. Despite this fact, precision-based scores such as mean average precision (MAP) remain the primary evaluation measures for patent retrieval. Our study examines different evaluation measures for the recall-oriented patent retrieval task and shows the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application taking account of recall and user search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the expected search effort of patent searchers.
Toward higher effectiveness for recall-oriented information retrieval: A patent retrieval case study
Research in information retrieval (IR) has largely been directed towards tasks requiring high precision. Recently, other IR applications which can be described as recall-oriented IR tasks have received increased attention in the IR research domain. Prominent among these IR applications are patent search and legal search, where users are typically ready to check hundreds or possibly thousands of documents in order to find any possible relevant document. The main concerns in this kind of application are very different from those in standard precision-oriented IR tasks, where users tend to be focused on finding an answer to their information need that can typically be addressed by one or two relevant documents. For precision-oriented tasks, mean average precision continues to be used as the primary evaluation metric for almost all IR applications. For recall-oriented IR applications, the nature of the search task, including objectives, users, queries, and document collections, is different from that of standard precision-oriented search tasks. In this research study, two dimensions in IR are explored for the recall-oriented patent search task: IR system evaluation and multilingual IR for patent search. In each of these dimensions, current IR techniques are studied, and novel techniques developed especially for this kind of recall-oriented IR application are proposed and investigated experimentally in the context of patent retrieval. The techniques developed in this thesis provide a significant contribution toward evaluating the effectiveness of recall-oriented IR in general and particularly patent search, and improving the efficiency of multilingual search for this kind of task.
Simple vs. sophisticated approaches for patent prior-art search
Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness of the two techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that time and effort can be saved by applying simple IR approaches when initial citations are provided.
Your Stance is Exposed! Analysing Possible Factors for Stance Detection on Social Media
To what extent can a user's stance towards a given topic be inferred? Most studies on stance detection have focused on analysing users' posts on a given topic to predict their stance. However, stance in social media can be inferred from a mixture of signals that might reflect a user's beliefs, including posts and online interactions. This paper examines various online features of users to detect their stance towards different topics. We compare multiple sets of features, including on-topic content, network interactions, users' preferences, and online network connections. Our objective is to understand the online signals that can reveal users' stance. Experiments are run on a tweet dataset from the SemEval stance detection task, which covers five topics. Results show that the stance of a user can be detected from multiple signals of the user's online activity, including their posts on the topic, the network they interact with or follow, the websites they visit, and the content they like. The performance of stance modelling using different network features is comparable with that of the state-of-the-art reported model, which used textual content only. In addition, combining network and content features leads to the highest reported performance to date on the SemEval dataset, with an F-measure of 72.49%. We further present an extensive analysis to show how these different sets of features can reveal stance. Our findings have distinct privacy implications: they highlight that stance is so strongly embedded in users' online social networks that, in principle, individuals can be profiled from their interactions and connections even when they do not post about the topic.
Comment: Accepted as a full paper at CSCW 2019. Please cite the CSCW version.